
kpjeeja commented Dec 4, 2025

What?

Describe what this PR is doing.

Why?

Justification for the PR. If there is an existing issue/bug, please reference it. For
bug fixes, the 'Why?' and 'What?' can be merged into a single item.

How?

It is optional, but for complex PRs, please provide information about the design,
architecture, approach, etc.

kpjeeja force-pushed the jeeja_test_debug branch 9 times, most recently from 6187f49 to 4a4aa7d, on December 19, 2025 10:44
kpjeeja force-pushed the jeeja_test_debug branch 5 times, most recently from 942ad31 to abde8e5, on December 29, 2025 15:53
jeejakp12 and others added 4 commits January 13, 2026 17:15
…ynamo#908)

This change:
 - passes the number of rail endpoints to other agents via serialization,
 - adds all remote endpoints for each local rail to the addresses,
 - picks local rails and remote endpoints round-robin for data transfer.
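The three points above describe a multi-rail transfer scheme. The following is a minimal sketch of it, assuming a simplified model in which rails and endpoints are plain indices and the peer's endpoint count has already been received through serialization. `build_address_table` and `RoundRobinPicker` are hypothetical names, not the libfabric plugin's API. Visiting every (local rail, remote endpoint) pair, with the local rail advancing fastest, is only one possible reading of "round-robin".

```python
from typing import List, Sequence, Tuple


def build_address_table(num_local_rails: int,
                        remote_endpoint_addrs: Sequence[str]) -> List[List[str]]:
    """Hypothetical model: every local rail gets an entry for every remote endpoint."""
    return [list(remote_endpoint_addrs) for _ in range(num_local_rails)]


class RoundRobinPicker:
    """Hypothetical picker that cycles through all (local rail, remote endpoint) pairs.

    The local rail index advances on every call; the remote endpoint index
    advances once per full sweep of the local rails.
    """

    def __init__(self, num_local_rails: int, num_remote_endpoints: int) -> None:
        if num_local_rails <= 0 or num_remote_endpoints <= 0:
            raise ValueError("need at least one local rail and one remote endpoint")
        self._num_local = num_local_rails
        self._num_remote = num_remote_endpoints
        self._counter = 0

    def next_pair(self) -> Tuple[int, int]:
        local = self._counter % self._num_local
        remote = (self._counter // self._num_local) % self._num_remote
        # Wrap the counter once every pair has been used.
        self._counter = (self._counter + 1) % (self._num_local * self._num_remote)
        return local, remote


# Example: 2 local rails; the peer advertised 3 endpoints.
addresses = build_address_table(2, ["ep0", "ep1", "ep2"])
picker = RoundRobinPicker(2, len(addresses[0]))
picks = [picker.next_pair() for _ in range(6)]
for local, remote in picks:
    print(f"local rail {local} -> {addresses[local][remote]}")
```

Advancing the local rail first spreads consecutive transfers across local rails, and the wrapping counter guarantees that each pair is used once per cycle.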

Signed-off-by: Feng Ji <[email protected]>

* libfabric: Fix creation of localMD

When creating the MD for local operations, populate the remote selected
endpoints array.
Remove the dedicated code path for local operations within the same agent
process; they now use the general flow, since localMD is created correctly.

Signed-off-by: Amit Radzi <[email protected]>

* libfabric: Refactor load MD functions

Create a helper function for the logic shared by loadLocalMD and
loadRemoteMD.

Signed-off-by: Amit Radzi <[email protected]>

---------

Signed-off-by: Amit Radzi <[email protected]>
Co-authored-by: Adit Ranadive <[email protected]>

It crashed when running with the nixl log level set to DEBUG.

Signed-off-by: Feng Ji <[email protected]>
Co-authored-by: Mikhail Brinskiy <[email protected]>
4 participants